A Finite-state Morphological Analyser for Tuvan
نویسندگان
چکیده
This paper describes the development of free/open-source finite-state morphological transducers for Tuvan, a Turkic language spoken in and around the Tuvan Republic in Russia. The finite-state toolkit used for the work is the Helsinki Finite-State Toolkit (HFST), we use the lexc formalism for modelling the morphotactics and twol formalism for modelling morphophonological alternations. We present a novel description of the morphological combinatorics of pseudo-derivational morphemes in Tuvan. An evaluation is presented which shows that the transducer has a reasonable coverage—around 93%—on freely-available corpora of the languages, and high precision—over 99%—on a manually verified test set.
منابع مشابه
Fast Morphological Analysis of Czech
This paper presents a new Czech morphological analyser which takes an advantage of Jan Daciuk’s algorithms for minimal deterministic acyclic finite state automata. The new analyser is six times faster than the current analyser ajka concerning the proper analysis, i.e. returning possible lemmata and tags for a given word form, but for some other related tasks is the difference even bigger.
متن کاملA Two-Level Morphological Analyser for the Indonesian Language
This paper presents our efforts at developing an Indonesian morphological analyser that provides a detailed analysis of the rich affixation process. We model Indonesian morphology using a two-level morphology approach, decomposing the process into a set of morphotactic and morphophonemic rules. These rules are modelled as a network of finite state transducers and implemented using xfst and lexc...
متن کاملA Morphological Analyser for Machine Translation Based on Finite-state Transducers
A finite-state, rule-based morphological analyser is presented here, within the framework of machine translation system TAVAL. This morphological analyser introduces specific features which are particularly useful for translation, such as the detection and morphological tagging of word groups that act as a single lexical unit for translation purposes. The case where words in one such group are ...
متن کاملModularisation of Finnish Finite-State Language Description - Towards Wide Collaboration in Open Source Development of a Morphological Analyser
In this paper we present an open source implementation for Finnish morphological parser. We shortly evaluate it against contemporary criticism towards monolithic and unmaintainable finite-state language description. We use it to demonstrate way of writing finite-state language description that is used for varying set of projects, that typically need morphological analyser, such as POS tagging, ...
متن کاملA Finite State Approach to Abkhaz Morphology and Stress
The West Caucasian language Abkhaz is characterized by a rich but rather regular agglutinative morphology. Word stress, however, is free and dynamic and difficult to predict. A theory of stress in Abkhaz has been developed by V. Dybo, A. Spruit and L. Trigo which predicts word stress correctly in the majority of cases. Although stress is not orthographically marked, its position determines the ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016